Comparing Similarity Measures for Distributional Thesauri

نویسندگان

  • Muntsa Padró
  • Marco Idiart
  • Aline Villavicencio
  • Carlos Ramisch
چکیده

Distributional thesauri have been applied for a variety of tasks involving semantic relatedness. In this paper, we investigate the impact of three parameters: similarity measures, frequency thresholds and association scores. We focus on the robustness and stability of the resulting thesauri, measuring inter-thesaurus agreement when testing different parameter values. The results obtained show that low-frequency thresholds affect thesaurus quality more than similarity measures, with more agreement found for increasing thresholds. These results indicate the sensitivity of distributional thesauri to frequency. Nonetheless, the observed differences do not transpose over extrinsic evaluation using TOEFL-like questions. While this may be specific to the task, we argue that a careful examination of the stability of distributional resources prior to application is needed.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Nothing like Good Old Frequency: Studying Context Filters for Distributional Thesauri

Much attention has been given to the impact of informativeness and similarity measures on distributional thesauri. We investigate the effects of context filters on thesaurus quality and propose the use of cooccurrence frequency as a simple and inexpensive criterion. For evaluation, we measure thesaurus agreement with WordNet and performance in answering TOEFL-like questions. Results illustrate ...

متن کامل

B2SG: a TOEFL-like Task for Portuguese

Resources such as WordNet are useful for NLP applications, but their manual construction consumes time and personnel, and frequently results in low coverage. One alternative is the automatic construction of large resources from corpora like distributional thesauri, containing semantically associated words. However, as they may contain noise, there is a strong need for automatic ways of evaluati...

متن کامل

Learning Thesaurus Relations from Distributional Features

In distributional semantics words are represented by aggregated context features. The similarity of words can be computed by comparing their feature vectors. Thus, we can predict whether two words are synonymous or similar with respect to some other semantic relation. We will show on six different datasets of pairs of similar and non-similar words that a supervised learning algorithm on feature...

متن کامل

Discovering Distributional Thesauri Semantic Relations

The paper presents technique and analysis to discover distributional thesauri relations by using statistical similarity of different word’s contexts. The application uses educational electronic text corpus and the Sketch Engine software statistical search to extract and compare word’s collocations from the related text corpus. The semantic search used is based on the evaluation and comparison o...

متن کامل

Identifying Bad Semantic Neighbors for Improving Distributional Thesauri

Distributional thesauri are now widely used in a large number of Natural Language Processing tasks. However, they are far from containing only interesting semantic relations. As a consequence, improving such thesaurus is an important issue that is mainly tackled indirectly through the improvement of semantic similarity measures. In this article, we propose a more direct approach focusing on the...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2014